All Questions
Tagged with deep-learning reinforcement-learning
172 questions
3 votes
2 answers
44 views
Required background for thorough understanding of Causal ML research papers?
I'm interested in pursuing research in the intersection of causal inference and machine learning, particularly on causal discovery and causal representation learning. Through my exploration so far, I ...
2 votes
1 answer
84 views
How to deal with actions that complete in multiple steps (delayed reward) in reinforcement learning?
I have been exploring RL and using DQN to train an agent for a problem where I have two possible actions. But one of the actions is supposed to complete over multiple steps while the other one is ...
1 vote
1 answer
51 views
Can I use minimax tree search over Q-values?
I'm trying to build a chess bot, and I'm trying to figure out if I can use Q-Values in a search tree by creating new nodes according to the number of possible moves, each with the corresponding Q-...
2 votes
2 answers
425 views
What is the reinforcement learning reward function for reasoning in DeepSeek-R1?
DeepSeek-R1 reportedly applies Group Relative Policy Optimization, where it rewards "accuracy". How is this accuracy measured for theorem proving? A proof can be stated in myriad ...
0 votes
0 answers
27 views
Is my MAML implementation correct?
I'm trying to implement the MAML algorithm in the reinforcement learning domain but am not achieving fast adaptation to my validation tasks. I assume that something may be wrong with my meta loss ...
0 votes
0 answers
39 views
What’s the State of the Art in Traffic Light Control Using Reinforcement Learning? Ideas for Master’s Thesis?
I’m currently planning my Master’s thesis and I’m interested in the application of RL to traffic light control systems. I’ve come across research using different algorithms. However, I wanted to know: ...
1 vote
1 answer
53 views
What type of noise should I use with softmax activation?
I'm implementing an RL agent that navigates a graph. I'm using a softmax activation in the final layer of the actor network to model the action probabilities. To encourage exploration during training, ...
0 votes
1 answer
22 views
Unidentifiable flipped sign in policy gradient
Today I was building a VPG agent for a test and noticed it was getting worse, not better, over time, so I flipped the reward during the training loop and, lo and behold, it learned. So obviously I started ...
1 vote
1 answer
77 views
How do I correctly apply action masking during DDPG training in Python?
I'm implementing the Deep Deterministic Policy Gradient (DDPG) algorithm in PyTorch, and I'm facing issues with applying an action mask during the training process. Currently, I apply an action mask ...
0 votes
3 answers
56 views
Why does TD3/DDPG use −E[Q(s,π(s))] as the policy loss without causing Q-values to go to infinity?
I tried to understand why TD3/DDPG use a policy loss of −E[Q(s,π(s))], which should make the policy maximize Q-values. I expected this to push Q-values to infinity over time, as there’s no explicit ...
3 votes
1 answer
287 views
Can two different non-optimal policies have the same value functions?
According to Sutton and Barto, second edition, page 79, policy improvement must give a better policy except when the policy is already optimal. This means that if two policies have the same value function ...
1 vote
1 answer
135 views
Is deep learning suitable/preferable for string similarity detection and application automation? If so, which type?
Newbie here. I have developed an app that basically does the following: perform OCR, check if words are contained in the resulting text, and then perform an action. If no words are detected from the given list, ...
0 votes
1 answer
142 views
Is reinforcement learning suitable for application automation?
I have basically automatised the use of an app through the use of OCR and computer vision. So basically when a word or an image is detected it will perform a certain action. When that action is ...
1 vote
0 answers
38 views
Why are two completely different algorithms being used in Deep Q-Learning?
I'm a new student in reinforcement learning. Recently, I've been studying different RL algorithms. But I'm quite surprised that there are some algorithms which are named as "same" ...
1 vote
0 answers
25 views
Enhancing Generalization in DRL Agents in Static Data Environments
Context: I'm working with a deep reinforcement learning (DRL) agent in a market-like environment where its actions do not affect the environment. The environment uses historical data up to a certain ...